Saturday, September 13, 2008

zfs-fuse 0.5.0 released

Hi,

After resyncing the ZFS code in zfs-fuse to OpenSolaris build 98, I have (finally!) decided to make a new release.

I know that going from 0.4.0_beta1 to 0.5.0 is not the most logical thing to do.

However, if you consider that 0.4.0_beta1 was released more than 18 months ago and if you consider all the improvements that have been made since then, it makes a little more sense :-)

(By the way, if you're still running 0.4.0_beta1... please, please, please upgrade *now* - serious bugs have been fixed).

This release brings us up-to-date with ZFS pool version 13 (try "zpool upgrade -v").
You can read about any zfs-fuse specific fixes or enhancements in the CHANGES file, although I'm sure I missed a few things.

One thing to mention is that there's no development going on in zfs-fuse in terms of missing features (the ones in the STATUS file). At the moment you can only expect bug fixes and ZFS code updates.

For those of you who have not been following zfs-fuse development closely, please be assured that if you are using IDE, SATA or SCSI disks, you no longer need to disable the write caches of your devices (thanks to Eric Anopolsky).

However, that is only true if you use raw disks or partitions as your vdevs -- if you use LVM, EVMS or other devices (such as loop devices), the disks' write caches should still be disabled manually to avoid potential problems.

If you want peace of mind you can check your syslog right after zfs-fuse starts to do any I/O - zfs-fuse will notify you if it can't flush the write cache.

Enjoy!

Saturday, January 19, 2008

Status update

Some people have (quite understandably) asked me to post a status update, so here goes:

The good news:

  • The project is not dead.
  • The code is being updated every month with new features and bug fixes from OpenSolaris.
  • Critical bugs, and especially any reports of corruption, will receive my full attention, so please report them if that happens to you. I am 100% committed to fixing any such bugs, but I haven't received a report of that kind in quite a while. However, I'm not sure whether the corruption issue has truly been fixed, simply because I've been unable to reproduce it.
  • I will make an effort to review and integrate any patches that I receive that improve zfs-fuse.
The bad news:
  • The beta release is quite old already, so I should definitely make a new release..
  • I haven't been able to work on improving zfs-fuse, except for the monthly updates and the occasional bug fixes.
  • I have a couple of features that I've started to work on but haven't had the time to finish. One of them is async I/O and the other one is automatic disabling of disk write caching with hdparm and sdparm.
That said, if you would like to use zfs-fuse without having to worry about the safety of your data, please do the following:

  • Use the latest development code, from the Mercurial trunk (very important)
  • Disable disk write caching on your disks with hdparm and sdparm (very important)
  • Create and import your pools with devices from /dev/disk/by-id (highly recommended)

I am reluctant to release a new version until these last 2 points are dealt with in a safe way, either by displaying flashing big red warnings along with setting the mixer volume to the maximum and playing sounds of sirens, or by making zfs-fuse handle it automatically (probably better).

Wednesday, September 12, 2007

Irony

Hi everyone,

I'm finally updating this blog to let you know what's been going on.

The ZFS-FUSE wiki and mercurial repositories are now running on a GeoWirx Xen VPS, so the long downtimes will be a thing of the past (knock knock). By the way, highly recommended service - those VPSs are good and cheap - excellent experience so far.

At the moment, a new port of ZFS-FUSE to 32-bit PowerPC is being worked on, with the kind help of Thomas Riddle (Sun engineer) and Matt Sealey (Genesi manager) and his developers. A few weeks ago I also enabled direct I/O, which slowed performance, and I will soon implement async I/O to hopefully get it back.

As you may have noticed, progress on ZFS-FUSE has been slow. These past few months I've been finishing my degree, which has required a lot of time (only one exam left.. tomorrow!), and I've been on vacation.

What's perhaps more interesting is that back in June I was invited to work for Cluster File Systems on the Lustre file system. The idea is that Lustre will be using the DMU portion of the ZFS-FUSE port to Linux as its storage backend.

This means that the talented team of engineers who have pushed Lustre into new levels of scalability will be looking to do the same with the ZFS DMU!
Of course, this doesn't necessarily mean that you will see a huge boost in performance in ZFS-FUSE, but I think many improvements are still possible. And now that I've started working for CFS I'm also learning new things by the minute :)

Oh, and expect to see the corruption issue fixed soon (kernel patch might be required, unfortunately).

What's also interesting is that just today Sun, which as you all know created ZFS, announced that it has acquired CFS. So, if everything goes well and they still decide to keep me... next week I will be a Sun engineer!

Anyway, you may have noticed that I've removed the PayPal links (thanks to everyone who donated!), since I don't think it will make much of a difference anymore.

Well, in the midst of all this, I hope I can still dedicate some time to work on ZFS-FUSE (after all, I am also using it). And of course.. patches are always welcome!

Wednesday, June 20, 2007

Site down

Hi everyone,

Recently an article was posted on LinuxWorld about ZFS-FUSE. See also the Slashdot article.

In case you're wondering why the ZFS-FUSE website is down, it's because I have been without Internet access since Monday. I never thought I'd say this again, but I'm using a 33 kbps dial-up connection right now (I think it might even be slower than that)..

As a consequence, the Mercurial repositories are also down, and I might not receive some of your emails..

On another note, there have been recent reports of corruption in RAID-Z pools, so please back up your data before trying ZFS-FUSE!

Monday, April 16, 2007

ZFS in the Linux kernel?

Due to the recent surge of interest in porting ZFS to the Linux kernel (if you are in the mood to read dozens of messages, see this thread, the follow-up, plus this one and one more), I'd like to offer my view on things.

I have a feeling most Linux kernel hackers (or at least those that talk about ZFS on linux-kernel) don't really know how ZFS works or what it can do. The best example is perhaps this message from Rik van Riel.

Well, first of all, ZFS doesn't have/need fsck (what?! are you nuts??). This is because ZFS checks and repairs the filesystem online and on-the-fly, as it is being used. And when it can't repair, it will pinpoint exactly which files and which bytes in those files were corrupted. You might think this is complex or expensive, but it's really simple and beautiful actually. These slides explain this and a lot more, so please read them carefully.

The great thing is that ZFS can also repair metadata on-the-fly even on ZFS pools that don't have any inherent redundancy (in other words, this also works for single disks). This is due to a feature called ditto blocks, which basically keeps multiple copies of metadata dynamically spread across the disk. Oh, and now this works for data too, so you can configure a filesystem holding important files to keep 2 or even 3 copies of its data on the disk (in addition to any inherent pool redundancy).

ZFS has a lot of other nice things too, like cheap and instantaneous snapshots and clones, optional compression, variable block sizes, easy management... I really think interested people should read these slides and try zfs-fuse.

Now regarding a ZFS port to the Linux kernel:

1) As for technical difficulty, I don't think it is a problem. I don't know Linux VFS internals, but if I was able to port ZFS so easily to FUSE, it can certainly be done in the kernel as well.

2) As for the license, well.. that is a real problem. I'm a big believer in FSF's ideals, but in this case I think the GPLv2 is preventing progress. It would be a big plus to have Linux benefit from a fully open-source, useful piece of functionality with 6 years of development behind it.

Of course, as Adrian Bunk put it, I don't think it'll be possible to get 10,000 (living and dead) people to agree on a licensing change.

One option would be to reimplement ZFS (or a comparable filesystem) from scratch. I don't think this is feasible, first because it would require a huge effort and several years to reach the same level of robustness as ZFS has right now. And second because Sun has filed more than 50 patents on ZFS. Even if Sun never uses those patents against Linux, some people might see it as a risk (in the United States).

The only way I'm seeing ZFS on the Linux kernel is to convince Sun to dual-license ZFS under the GPL and the CDDL. Some people might say Sun would never do this, but Sun has been very open to the open-source community recently. And in fact, Sun's ZFS FAQ initially had an answer saying Sun was considering a ZFS port to Linux (not to FUSE, that was my idea ;).

Finally I'd like to debunk a couple of myths about zfs-fuse:

1) In terms of features, zfs-fuse will certainly be comparable to a ZFS kernel implementation (and in fact, most of it already works). The only thing that can't be done is to store swap on a ZFS pool, due to the way FUSE works. You can see the STATUS file for more details about implemented features.
2) As for performance, well.. zfs-fuse is slow right now, but it will certainly improve. I haven't even started to seriously look at performance. And FUSE-based filesystems can have comparable performance to kernel filesystems, as the bottleneck is usually the disk(s), not the CPU.

Tuesday, March 06, 2007

ZFS for Linux Beta 1

Hi everyone,

I'm happy to announce the release of ZFS on FUSE/Linux 0.4.0 beta 1.

Even though this is a beta release, it should be more stable than your typical beta filesystem. The main problems in this release are (lack of) performance and high memory usage with some load patterns.

As always, be sure to read the README and the STATUS files.

So, what's the future for zfs-fuse?

The plan is to implement all the missing features marked for 0.4.0 in the STATUS file and then release beta2. After that I intend to focus on performance, release rc1 and after extensive testing release 0.4.0 final.

I wish to thank all the users who have been testing and helping this project.
I also want to give a special thanks to Phil Worrall, Chris Samuel, David Plumpton and especially Roland (devzero) for all the patches, bug reports, tests and suggestions :)

Thursday, January 18, 2007

News

Hi,

I haven't been able to make much progress these last couple of weeks due to school projects and exams, however there have been a few bug fixes and performance improvements. Thank you to everyone who reported bugs and sent patches (and suggestions)!

I expect to release alpha2 in the next few days, after some new functionality is implemented.

Interestingly, Patrick Verner has been busy adding support for zfs-fuse in his Parted Magic LiveCD/USB project, which is now officially the first Linux distribution supporting ZFS!

Seems very cool, so if you want to test zfs-fuse but don't feel like compiling it yourself, give it a shot ;)

Oh, and I've created a zfs-fuse discussion group, following the suggestion of Miguel Filipe :)

By the way - if you encounter any unexpected problem using zfs-fuse, please report a bug or post a message to the zfs-fuse discussion group, otherwise I can't fix it..

Tuesday, December 26, 2006

First alpha of ZFS on FUSE with write support

Ladies (?) and gentlemen, the first preview of ZFS on FUSE/Linux with full write support is finally here!

You can consider it my (late) Christmas gift for the Linux community ;)

Don't forget this is an alpha-quality release. Testing has been very limited.

Performance sucks right now, but it should improve before 0.4.0 final, when a multi-threaded event loop and kernel caching support are working (both should be easy to implement, since FUSE provides the kernel caching).

For more information, see the README and the STATUS file for working/not working features. Download here.

Let me know how it works, and don't forget to report bugs!

Friday, December 15, 2006

Read-only support for ZFS on Linux

I know it has been a loong time since my last post (sorry!), but today I'm very excited to bring you zfs-fuse 0.3.0 which is able to mount ZFS filesystems in read-only mode :)

Current status:
  • It is possible to create and destroy ZFS pools, filesystems and snapshots.
  • It is possible to use disks (any block device, actually) and files as virtual devices (vdevs).
  • It is possible to use any vdev configuration supported by the original ZFS implementation. This includes striping (RAID-0), mirroring (RAID-1), RAID-Z and RAID-Z2.
  • It is possible to change properties of filesystems.
  • It is possible to mount ZFS filesystems, but you can only read files and directories; you cannot create, modify or remove them yet.
  • ZIL replay is not implemented yet.
  • It is not possible to mount snapshots.
  • It is not possible to use 'zfs send/recv'.
  • ACLs and extended attributes do not work.
  • There is no support for ZVols.
  • It's buggy and probably has a few memory leaks :p
If you want to test it, just download it and follow the README (don't forget to read the prerequisites).

A few notes:

  • Even though you can't write to filesystems, the pools are opened in read-write mode. There are bugs, and they could corrupt your pools, so don't use zfs-fuse on important data!
  • There's no point in running benchmarks, since it is still highly unoptimized.
  • You cannot write to ZFS filesystems yet, so the best you can do right now is populate a filesystem in Solaris and then mount it in Linux. I recommend creating your zpools on files (since it makes switching between Linux and Solaris easier), but you can also use block devices directly.
  • I recommend using EVMS if you keep your ZFS pools on block devices, since it places all of them in /dev/evms, which makes it easier to import pools.
  • If you create your zpools in Solaris directly on whole disks, ZFS will create an EFI label, so to properly mount them on Linux you'll need GPT/EFI partition support configured in the kernel (I think most x86 and amd64 kernels don't have it enabled, so you must compile the kernel yourself). Since my USB disk has died and I'm still waiting for a replacement, I can't properly test this yet. The last time I tried I had some difficulty getting it to work, but I think I managed it using EVMS.
  • In order to import zpools on block devices, you'll need to run 'zpool import -d /dev'. Be careful, since at the moment zpool will try to open every device in /dev looking for ZFS pools! If you're using EVMS, use /dev/evms instead.
The project has been progressing at a fast pace since last week, when I did some major code restructuring and finished uncommenting most of the original ZPL code :)

I am still highly confused about vnode lifetimes, so expect some bugs and probably some memory leaks until I eventually sort it out..

Enjoy!

Tuesday, October 03, 2006

First VFS operation working

After 3 afternoons of work during my recovery period and a couple of hours today, I got the first VFS operation to work! :)

So.. if you compile the latest development code from the Mercurial repository, you can now use 'df' on ZFS filesystems in Linux (but I don't recommend you try it on filesystems you don't want to corrupt) :p

It was a lot of work since I had to make zfs_mount() work, which depends on a lot of other code (I even had to import the range locks code).

The VFS is quite complex -- I still don't have a firm grasp of the whole VFS and vnode allocation/deallocation/reference counting, so my code is very messy, and there are still a lot of bugs :p

I also got a little disappointed because FUSE doesn't support remounting, so you won't be able to change any mount property on mounted filesystems (like for example, 'zfs set readonly=on pool') - you'll have to unmount and then mount it again..

Next step is to fix bugs/clean-up code and then implement readdir and stat (in order to use 'ls').

Wednesday, September 13, 2006

What's new

So what's new about zfs-fuse?
Unfortunately not much :p

Since my last post I've basically been enjoying what was left of my summer vacation (sorry :p). School started this week, and I've managed to schedule 2 class-free days per week, which is good for this project ;)

In the meantime, I've posted some patches to zfs-code, started to work on the FUSE part of the project (finally!), and migrated the code repository from Subversion to Mercurial (you can access it here).

Mercurial will greatly simplify my code management. It's an awesome CMS, and I've always wanted to learn how to use it, so I thought this would be a great time. Mercurial still has a few limitations -- for one, it doesn't handle symlinks, so I had to create a few rules in SCons to automatically create them, but it's nonetheless a great improvement over Subversion, even for a single developer.

Anyway, next week I'll be having surgery (it's sleep apnea related, nothing serious), and I'll need to be home for about 1 week recovering. I'm seriously hoping to take advantage of that time to finally get zfs-fuse to mount filesystems and do some basic operations, so stay tuned :)

Tuesday, August 22, 2006

ZFS being ported to the FreeBSD kernel

These last few days Pawel Dawidek has been working on porting ZFS to the FreeBSD kernel.

And he's made tremendous progress! He can already mount filesystems, list directories, create files/directories, change permissions.. Best of all, he did that in only 10 days!

Wow, now that's impressive.

Sunday, August 20, 2006

zfs-fuse version 0.2.0 released

Hi,

If you were jealous of my previous post, now you can play with ZFS on Linux too ;)

Just head over to the download page and follow the README.

Note that it's still not possible to mount ZFS filesystems, so you won't be able to read or write files, however you can already manage ZFS pools and filesystems.

As always, you should report any bugs or problems to rcorreia at wizy dot org, or by using the bug database.

Have fun!

Saturday, August 19, 2006

ZFS on Linux status

Hi everyone,

A lot of time has passed since my last post - sorry about that. I simply hadn't made any visible progress. My free time has been less than I expected, and the time needed for this part of the project is a bit more than I originally thought it would be ;)

Anyway... on to the news.

zfs_ioctl.c and libzpool-kernel are finally compiling, linking and partially working.

It's not possible to mount ZFS filesystems yet; however, a few commands already work:

$ uname -a
Linux wizy 2.6.15-26-amd64-generic #1 SMP PREEMPT Thu Aug 3 02:52:35 UTC 2006 x86_64 GNU/Linux

$ ~/zfs/trunk/zfs-fuse/zfs-fuse &

$ ./zpool status
no pools available

$ dd if=/dev/zero of=/tmp/test1 bs=1M count=100
$ dd if=/dev/zero of=/tmp/test2 bs=1M count=100
$ dd if=/dev/zero of=/tmp/test3 bs=1M count=100

$ ./zpool create pool raidz /tmp/test1 /tmp/test2 /tmp/test3
cannot mount '/pool': failed to create mountpoint

$ ./zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
pool 286M 87K 286M 0% ONLINE -

$ ./zpool scrub pool

$ ./zpool status
pool: pool
state: ONLINE
scrub: scrub completed with 0 errors on Sat Aug 19 03:45:45 2006
config:

NAME            STATE     READ WRITE CKSUM
pool            ONLINE       0     0     0
  raidz1        ONLINE       0     0     0
    /tmp/test1  ONLINE       0     0     0
    /tmp/test2  ONLINE       0     0     0
    /tmp/test3  ONLINE       0     0     0

errors: No known data errors

$ dd if=/dev/urandom of=/tmp/test2 bs=1M count=30

$ ./zpool scrub

$ ./zpool status
pool: pool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-4J
scrub: scrub completed with 0 errors on Sat Aug 19 03:47:37 2006
config:

NAME            STATE     READ WRITE CKSUM
pool            DEGRADED     0     0     0
  raidz1        DEGRADED     0     0     0
    /tmp/test1  ONLINE       0     0     0
    /tmp/test2  UNAVAIL      0     0     0  corrupted data
    /tmp/test3  ONLINE       0     0     0

errors: No known data errors

$ ./zfs list
NAME USED AVAIL REFER MOUNTPOINT
pool 60.6K 158M 2.00K /pool

$ ./zfs create pool/test
cannot mount '/pool/test': failed to create mountpoint
filesystem successfully created, but not mounted

$ ./zfs list
NAME USED AVAIL REFER MOUNTPOINT
pool 66.6K 158M 2.00K /pool
pool/test 2.00K 158M 2.00K /pool/test


There is still a major glitch in the zfs_ioctl <-> zpool/zfs communication, so I haven't uploaded the latest code to the SVN repository just yet, but I definitely expect to fix it tomorrow.

There's also an interesting bit of code that I implemented in order to help me debug zfs-on-fuse (also still not uploaded to SVN) that I'll talk about in my next post ;)

Stay tuned.

Friday, July 21, 2006

Status update

Woohoo, exams are over!! :)

Finally I'm going to have time to work on the project, yay :))

--

Today I got zfs_ioctl.c to compile (it's not linking yet; I've still got to get libzpool to compile in the simulated kernel context, which probably means copy/pasting most of zfs_context.h into the correct libsolkerncompat headers).
However, even after zfs_ioctl links with libzpool-kernel, I still have to code some additional functionality in order to get the zfs and zpool commands working.

--

In other news, this week I got a free 3-month Safari account, thanks to Google (and O'Reilly), which will be quite useful. It's incredible how these guys are always surprising me :D

After a little browsing of the available books, I've found one which has already proved itself to be helpful: Solaris Internals - Core Kernel Components. Although it was written at a time when only Solaris 7 was available, the VFS chapter content was still mostly accurate. I only wish it was more detailed.. :)

So, even with the help of the book (and the OpenSolaris OpenGrok browser, which I've been using since the beginning -- amazing, I already can't live without it), I've had some difficulty understanding some Solaris vfs/vnode interfaces, but I think I got it mostly right.

Of course, even if I haven't, I'm sure my kind and dedicated testers will help me find all the bugs, eventually.. ;)

Wednesday, July 12, 2006

FUSE implications on ZFS

Hi,

I know it's been almost 2 weeks since my last post, but I'm still in my university exam season. Anyway, after my last exam next Wednesday (the 19th), I'll be free to work on this project full-time ;)

Today I received a few interesting questions from Jim Thompson that I (and he) think you should know about.

"(snip) ...in reading the ZFS mailing list I've seen a couple of mentions that ZFS turns off the write cache on the disks it manages. There may be other low-level disk control issues in ZFS as well. Is it possible for ZFS to accomplish these low-level operations when running from user code in FUSE?

Secondly, how does FUSE+ZFS ensure that the linux kernel's disk cache doesn't interfere with ZFS's writes to the disk. When ZFS thinks it's written a block to disk, is there any possibility that the block is actually cached inside the linux kernel's list of dirty disk pages?"


Actually, regarding the write cache, ZFS on (Open)Solaris enables it if you give it a whole disk. The problem with disks' write caches is actually the reordering of writes. ZFS must have a guarantee that all the writes in the current transaction are flushed to the disk platter before writing the uberblock, in case power fails.

This will be accomplished in zfs-fuse by calling fsync(2) on file vdevs and ioctl(BLKFLSBUF) on block devices at the appropriate times (which ZFS already does), in order to flush all writes to disk. The (Linux) kernel guarantees that this will happen (on sane disks).

This is the only low-level interaction with the disks that ZFS cares about.
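
For the curious, here's a rough sketch in C of what such a flush could look like, using the two calls mentioned above. This is just an illustration, not the actual zfs-fuse code, and the function name and flag are made up:

/* Illustrative sketch only -- not the actual zfs-fuse implementation. */
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKFLSBUF */

/* Flush pending writes for a vdev before the uberblock is written.
 * 'is_block_device' is something the caller would have to track. */
int vdev_flush_writes(int fd, int is_block_device)
{
    if (is_block_device) {
        /* Block device vdev: ask the kernel to flush its buffered writes. */
        if (ioctl(fd, BLKFLSBUF, 0) != 0) {
            perror("ioctl(BLKFLSBUF)");
            return -1;
        }
    } else {
        /* File vdev: fsync() pushes the file's dirty data to disk. */
        if (fsync(fd) != 0) {
            perror("fsync");
            return -1;
        }
    }
    return 0;
}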

If your hard disk is broken or misbehaving so that it ignores the command to flush the write cache, you can always disable the write cache with hdparm(8)/sdparm(8)/blktool(8), just as you would have to do with any other journaling filesystem.

I don't recommend disabling the write cache unless you know your disk misbehaves, because the write cache is actually a good thing - it improves performance and your disk will last longer.

However, there's another thing that worries me a little more, and that I'll have to look into later on.

The issue is with the Linux kernel read cache. I don't know exactly at what level ZFS caches nodes/blocks, so if I'm not careful there could be cache duplication, which would manifest itself as wasted memory.
FUSE has a few mount options that allow one to control the kernel cache behaviour - direct_io, kernel_cache and auto_cache.
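
Just to illustrate how one of those gets wired up (this is a sketch, not the real zfs-fuse startup code), the FUSE 2.x option API lets you inject a mount option before handing control to fuse_main():

/* Illustrative only -- not the real zfs-fuse startup code. */
#define FUSE_USE_VERSION 26      /* FUSE 2.x API */
#include <fuse_opt.h>

int main(int argc, char *argv[])
{
    struct fuse_args args = FUSE_ARGS_INIT(argc, argv);

    /* Bypass the kernel page cache for this mount.  "-okernel_cache" or
     * "-oauto_cache" would be the other options mentioned above. */
    fuse_opt_add_arg(&args, "-odirect_io");

    /* ...a real filesystem would now hand 'args' to fuse_main()... */

    fuse_opt_free_args(&args);
    return 0;
}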

Actually, I don't know which will be better - disabling the kernel cache or disabling the ZFS cache (or portions of it).
I'll try to investigate this issue when the time comes :)

Friday, June 30, 2006

Why ZFS is needed even in desktops and laptops

Personally, I can't use my computer unless I know I have reliable software and hardware. I think I have mostly achieved that goal, since my computer pretty much never crashes or loses data (apart from the occasional application bug).

Now, even though I use a reliable journaling filesystem (XFS) on my Linux system, I like to do a filesystem consistency check once in a while (usually not less than once every 3 months), which can only happen on those rare occasions when I need (or want) to reboot. Today was one of those days.

And here are the results: xfs_repair.txt. I ended up with 90 files and empty dirs in lost+found. Why did this happen? It could be a hardware problem - either the hard disk, the SATA cable, the SATA controller or even the power supply; or a software bug - either in the SATA driver, the XFS code or somewhere else in the Linux kernel.

I actually suspect this is a hardware problem. This particular machine, back when I was using a different SATA controller and a different hard disk, had the very annoying problem of not flushing the disk write cache on reboots. That caused *a lot* of problems of the kind you see above in the xfs_repair log. I even tried MS Windows, which would run chkdsk on every other reboot. So the problem was definitely hardware related. Even though I never fixed it, fortunately I never lost an important file!

Now, after the hard disk died, I bought a new one and changed SATA controller (my motherboard has 2), just to be on the safe side. But, well, as you can see above, something's still definitely not working correctly.

This is one of the reasons I need ZFS. I don't want to lose files or end up with mysteriously corrupted ones. I want to see how often data gets corrupted. I want to see whether corruption only happens after a reboot (which would mean it's a disk write-cache flush problem), or whether it happens while the system is running (I can't fsck XFS filesystems while they're being used). Of course, I want to do this in order to diagnose the problem and fix it.

And even if the hardware only corrupts data once in a blue moon, I need my filesystem to properly report a checksum error and retry (or return an error), instead of returning corrupted data. Basically, I want a reliable computer..

Monday, June 26, 2006

zfs and zpool programs successfully compile and link.

Beware -- long post.

Sorry for the lack of updates recently, I've been kind of busy with other stuff.. ;)

I have good news. As you can see from the title, the zfs and the zpool programs now successfully compile and link :)

There are 2 known functionality losses: Linux doesn't seem to have a libdiskmgt equivalent, and porting it seems rather complicated, if not almost impossible, so there is no "device in use" detection. If anyone knows a good way to solve this, I'm all ears :)

The other loss is "whole disk" support. This must eventually be solved, since it's rather common in Solaris to dedicate a whole disk as a zpool vdev. I believe this can be circumvented for now by creating a partition that uses the whole disk and creating the zpool vdev on the partition device.

The specific problem is that ZFS uses EFI labels on disks. I only found one working EFI library for Linux, but its API is different from that of the OpenSolaris implementation. Once again, porting the EFI functionality from OpenSolaris proved to be difficult.

--

Unfortunately the zfs and zpool programs don't do anything useful yet.

In the original implementation, they use the ioctl(2) system call through the /dev/zfs device to communicate with the kernel. Since this is a user-level implementation, there will be no /dev/zfs.

In the zfs-fuse implementation, instead of using ioctl(), we communicate through a UNIX domain (local) socket which is created in /tmp/.zfs-unix/. So in order to make these commands actually do something, there must be a zfs-fuse process that answers the messages sent from zpool and zfs.
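
The server side of that is just standard UNIX domain socket code. Here's a rough sketch of the idea -- the socket file name and the error handling are made up for illustration, so check the repository for the real thing:

/* Rough sketch of how zfs-fuse could listen for zpool/zfs commands on a
 * UNIX domain socket.  Directory follows the post; details are illustrative. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>

#define ZFS_SOCK_DIR  "/tmp/.zfs-unix"
#define ZFS_SOCK_PATH ZFS_SOCK_DIR "/zfs_ioctl_sock"   /* hypothetical name */

int create_control_socket(void)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    mkdir(ZFS_SOCK_DIR, 0700);     /* ignore the error if it already exists */
    unlink(ZFS_SOCK_PATH);         /* remove a stale socket, if any */

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, ZFS_SOCK_PATH, sizeof(addr.sun_path) - 1);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 8) < 0) {
        perror("bind/listen");
        close(fd);
        return -1;
    }
    /* zpool/zfs connect() here and exchange the ioctl-style messages. */
    return fd;
}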

My plan to make zfs-fuse work is to take advantage of as much code from the original implementation as possible. Yes, this also means I want to use the original ZPL code. It seems to be the most reliable way to do it, and perhaps also the easiest one :)

In order to do that, I've created a libsolkerncompat library that will implement/translate the necessary OpenSolaris kernel code to make the ZPL work. This library will also be necessary in order to use the original zfs_ioctl.c implementation, since it uses some kernel VFS (Virtual File System) operations, along with other things.

This will also take some time to get it working, since a new zfs_context.h must be created or (more likely) the current zfs_context.h must be factored out to libsolkerncompat.
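
To give an idea of what this kind of compatibility layer looks like, here's a tiny illustrative sketch -- not the actual libsolkerncompat code, just the general shape of mapping Solaris kernel primitives onto userland equivalents:

/* Illustrative compatibility shims in the spirit of libsolkerncompat.
 * The real mappings in zfs-fuse may differ -- this just shows the idea. */
#include <stdlib.h>
#include <pthread.h>

/* Solaris kernel memory allocation -> plain malloc/free in userland. */
#define KM_SLEEP 0
static inline void *kmem_alloc(size_t size, int kmflag)  { (void)kmflag; return malloc(size); }
static inline void *kmem_zalloc(size_t size, int kmflag) { (void)kmflag; return calloc(1, size); }
static inline void  kmem_free(void *buf, size_t size)    { (void)size;   free(buf); }

/* Solaris kmutex_t -> a pthread mutex. */
typedef pthread_mutex_t kmutex_t;
#define mutex_enter(mp) pthread_mutex_lock(mp)
#define mutex_exit(mp)  pthread_mutex_unlock(mp)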

So, in a way, I can say I'm now in phase 3.5, since I'm actually working on the zfs-fuse process (which includes the necessary bits to make the ZPL code work), but I'm still not in phase 4, since zpool and zfs won't work until zfs_ioctl.c is ported.

Thursday, June 22, 2006

Massive cleanup, libzfs almost compiling

Today I did a massive cleanup of the source code.

As of now, there's a new library called libsolcompat that implements or translates all necessary Solaris-specific functions into glibc functions (yes, FreeBSD will require some work here).

This massive cleanup also means that all #includes will be exactly the same between the original source files and the Linux port! :)

This was achieved by overriding some glibc system headers with a local header, with the help of a gcc-specific preprocessor directive called #include_next. This directive allows one to include the overridden system header, while adding some functionality to it.

You can see a trivial example of this in the libsolcompat string.h, where a Solaris-specific function called strlcpy() was added.
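
If you've never seen #include_next before, the idea is roughly this -- a simplified sketch of such an override header, not the repository file verbatim, and the header guard name is made up:

/* libsolcompat/include/string.h -- simplified illustration of the trick. */
#ifndef SOLCOMPAT_STRING_H
#define SOLCOMPAT_STRING_H

/* First pull in the real glibc <string.h> that this header overrides... */
#include_next <string.h>

/* ...then add what the Solaris code expects.  strlcpy() copies at most
 * 'len' - 1 bytes, always NUL-terminates (if len > 0), and returns the
 * length of 'src' so callers can detect truncation. */
static inline size_t strlcpy(char *dst, const char *src, size_t len)
{
    size_t srclen = strlen(src);

    if (len > 0) {
        size_t copy = (srclen >= len) ? len - 1 : srclen;
        memcpy(dst, src, copy);
        dst[copy] = '\0';
    }
    return srclen;
}

#endif /* SOLCOMPAT_STRING_H */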

With this new file structure in place, it is now much easier to port new files.

This also means libzfs is almost fully compiling. There are only a few functions left to be ported, which don't seem too difficult. However, getting it to work correctly will still require some work, as there are some differences between a real filesystem and a FUSE filesystem (I don't think mount(2) will work, ioctls must be replaced by UNIX sockets, etc), and between Solaris and Linux, obviously (some /dev/dsk references are still in place, etc).

Tuesday, June 20, 2006

Phase 3 has begun

Hi,

Today I finally started working on Phase 3.

I already got libuutil to compile (zpool needs it), but in the process I stumbled upon a very subtle problem. The problem is that, when porting OpenSolaris code to Linux, the -Wno-unknown-pragmas flag is dangerous (I was using it to silence warnings about #pragma ident).

Why is it dangerous, you ask? Because gcc silently ignores #pragma init and #pragma fini, and with that flag you don't even get a warning about it. Then why do the OpenSolaris developers use -Wno-unknown-pragmas without problems?
Because when gcc is compiling for a Solaris target, it recognizes those pragmas. Not on Linux, though.

Well, what does this mean? It means that, to be on the safe side (I really don't want to track down an obscure bug related to a #pragma init that I missed), I had to remove that flag from the main SConstruct file, and I had to change 144 source files just to remove #pragma idents... On the bright side, I only have to do this once, since all future code integrations from OpenSolaris are patched automatically.
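
In case you're wondering what #pragma init actually does: it registers a function to be run when the object is loaded, much like gcc's constructor attribute on Linux. A minimal illustration of how such code could be translated by hand (not necessarily how zfs-fuse handles it; my_init is just an example name):

/* Solaris style: run my_init() automatically when this object is loaded.
 *     #pragma init(my_init)        <- silently ignored by gcc on Linux
 *
 * Equivalent that gcc on Linux does understand: */
static void my_init(void) __attribute__((constructor));

static void my_init(void)
{
    /* one-time initialization goes here */
}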

There's a related problem I had to deal with earlier. The gcc assembler on Solaris recognizes the slash (/) symbol as a way to start comments. Unfortunately, on Linux it doesn't, which means I had to remove all comments from the ported assembly files..

Anyway, in other news, Ernst Rholicek has contributed a Gentoo ebuild for all lazy Gentooers out there who want to help with testing ;)

That's it for now -- as usual, I'll keep you posted on my progress.