,----[ "Mohammed Riyaz" p_mdriyaz@fastmail.fm ]
| Hi,
|
| I have written this document of what you spoke in IIT. I am
| attaching a copy of it. This document will be put on the ilugc
| webserver. So if you do not like any part of it or want something
| changed, let me know.
|
| Thank you,
| Mohammed Riyaz P.
|
| P.S: it was a great session. :)
`----
As I promised, you can download a copy of my presentation and example code at
http://www.gnu-india.org/gnu/Hacking-GNU.pdf
http://www.gnu-india.org/gnu/Hacking-GNU.sxi
http://www.gnu-india.org/gnu/data-server.tgz
OK, here is a slightly revised version of your minutes (actually documentation :)

========================================================================
Hacking GNU/HURD
================
as a part of GLV@ILUGC
Speaker: Anand Babu
Email: ab (at) gnu.org.in
Date: 05 FEB 2005
Location: csd 320, Tenet seminar hall, IITM.
For all those who couldn't make it, you did miss a lot. Cheer up though, I will try my best to bridge that gap.
Brief Introduction:
-------------------
The Hurd is a collection of servers that run on the Mach micro kernel to implement file systems, network protocols, file access control, and other features that are implemented by the monolithic Unix kernel or similar kernels (such as Linux). GNU Mach 1.x was derived from CMU Mach 3. There is also a GNU Mach 2.0 branch based on Oskit from Utah (Oskit has Mach 4 code). L4 (another micro kernel) is currently being developed, and GNU Hurd hasn't booted off L4 yet. With L4 lacking the basic framework, AB recommended hacking on GNU Mach rather than L4. Moreover, getting a working GNU Hurd can be much faster with GNU Mach, as most of the groundwork has already been done. What GNU Hurd lacks as of now is mainly device driver support and performance optimization.
AB also emphasized the need to get GNU Hurd working up to industry standard (it already works fine) within two years, in order to be on par with the Linux kernel ;)
SNIP: L4 tasks manage their resources themselves, unlike under the Linux kernel.
Though they are called micro kernels, they aren't necessarily small in size. GNU Mach is big because of the Linux device drivers in it (more about this later).
Drivers in User Space Vs Drivers in Kernel Space:
-------------------------------------------------
AB spoke about the possibility of running device drivers in user space, unlike the Linux kernel, where device drivers run in kernel space. He does, however, recommend hacking drivers in kernel space to get them ready initially. Mach tasks communicate over IPC, and the advantages of user-space drivers outweigh the overhead of the abstraction; performance doesn't suffer that much. The reason is that IPC is essentially an abstraction with the "mach_msg" system call at its heart. (Messages are not like packets transferred between two socket applications.) If device drivers are made to run in user space, gdb can be run on a device driver to debug it.
SNIP: L4 doesn't copy/queue the messages and is totally synchronous. In L4 the designers are planning to use User space drivers.
Micro kernel Vs Monolithic Kernel:
---------------------------------
No, it wasn't a flame war ;)!!
Though Linus calls Linux modular, it isn't actually modular, because at run time the kernel runs as one big program (it is called modular because of the modules aspect of Linux). The disadvantage of this is that as the kernel gets bigger, it becomes more difficult to maintain. On the other hand, the micro kernel as such is small. The modules are in user space, each a separate program in its own address space, which helps in maintaining them.
Important concepts discussed about MACH:
----------------------------------------
-Threads
-Tasks
-Ports
-Message & Message queue.
Threads and tasks are exactly what we think they are. A task is like a container of threads, port rights... The new and interesting things were ports, messages and message passing. Threads are the basic unit of execution. A task can have one or more threads. However, tasks are not like Unix processes. They do not have a pid, gid ...
Ports:
******
Unlike the ports we are familiar with, ports in MACH are the portals through which different tasks communicate with each other. Messages to different tasks are sent and received through ports. Ports are message queues with some properties associated with them (like the message count).
Port rights:
************
These rights decide whether you can send/receive messages to/from a task (through the port) or not.
Different rights:
Send right - right to send to a port.
Receive right - right to receive messages from a port.
Send once right - this right is revoked after being used once.
Port Set - Collection of receive rights.
Just as the file descriptors of one program on Linux cannot be used by another program, the ports of a task are protected by the kernel. Port rights are transferred through messages. Though the tasks decide which rights to transfer, it is the kernel which actually does the transfer.
Port Name Space:
****************
The port name space is a structure maintained by the kernel for each task, which contains the task's different rights. Each send once right has a unique entry in the name space and hence a unique port name (discussed below). Similarly, each receive right has a unique port name, and each port set has a unique port name. Send rights and receive rights to the same port share the same port name. Remember that a receive right cannot be inside a port set and outside it at the same time.
Port Name:
**********
They appear as numbers, like file descriptors. Port names are indexes of port rights in the port name space.
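To make the naming rules a little more concrete, here is a small toy model in Python. It is only a sketch and not the Mach API: the class PortNameSpace, its methods and the right labels are all made up for illustration. It shows send and receive rights to one port sharing a name, while every send once right gets a name of its own.

```python
# Toy model only, NOT the Mach API. All names here are hypothetical.
from itertools import count


class PortNameSpace:
    """Per-task table mapping integer port names to the rights held."""

    def __init__(self):
        self._next_name = count(1)   # port names look like small integers
        self._entries = {}           # name -> {"port": port, "rights": set}

    def add_right(self, port, right):
        # Send and receive rights to the same port share one name.
        if right in ("send", "receive"):
            for name, entry in self._entries.items():
                if entry["port"] is port and entry["rights"] <= {"send", "receive"}:
                    entry["rights"].add(right)
                    return name
        # Each send-once right (and the first right to a port) gets a fresh name.
        name = next(self._next_name)
        self._entries[name] = {"port": port, "rights": {right}}
        return name

    def rights(self, name):
        return set(self._entries[name]["rights"])


port = object()                      # stands in for a kernel port
space = PortNameSpace()
recv_name = space.add_right(port, "receive")
send_name = space.add_right(port, "send")
once_a = space.add_right(port, "send-once")
once_b = space.add_right(port, "send-once")
```

Here recv_name and send_name come out equal, while once_a and once_b are distinct from each other and from the shared name.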
Port Set:
*********
Port sets were released with CMU Mach 3. A port set is a collection of receive rights for a task. You can let the kernel listen on the entire range of receive rights in a port set and notify the task whenever a message is ready (similar to the "select" system call).
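The "select"-like behaviour can be pictured with another toy model. Again, this is a sketch under assumptions, not how the kernel does it: Port and PortSet are hypothetical classes, and the receive here simply returns None when nothing is pending, where a real receive would block.

```python
# Toy model only: Port and PortSet are hypothetical, not Mach structures.
from collections import deque


class Port:
    """A port modelled as a plain message queue."""

    def __init__(self, label):
        self.label = label
        self.queue = deque()

    def send(self, message):
        self.queue.append(message)


class PortSet:
    """A group of receive rights that can be waited on together."""

    def __init__(self):
        self.members = []

    def add(self, port):
        self.members.append(port)

    def receive(self):
        # Hand back (label, message) from the first member with a pending
        # message, or None if every queue is empty.
        for port in self.members:
            if port.queue:
                return port.label, port.queue.popleft()
        return None


a, b = Port("a"), Port("b")
port_set = PortSet()
port_set.add(a)
port_set.add(b)
b.send("hello")
first = port_set.receive()
second = port_set.receive()
```

The task does one receive on the set instead of polling each port separately, which is the whole point of a port set.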
Messages:
*********
Messages consist of a Mach message header and a data part. The data part in turn has a data type field, a count field and a data field.
eg.
data type - int
count - 10
data - 0..9
As you guessed, an array of 10 ints. :)
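The header / type / count / data idea can be tried out with Python's struct module. This is only a sketch: the field sizes, the single-field "header" and the TYPE_INT32 code are assumptions for illustration and do not match the real Mach message layout.

```python
# Toy layout only: field sizes and TYPE_INT32 are assumptions, not Mach's.
import struct

TYPE_INT32 = 2  # hypothetical type code used only in this sketch


def pack_message(msg_id, type_code, values):
    header = struct.pack("<I", msg_id)                       # stand-in header
    descriptor = struct.pack("<II", type_code, len(values))  # type + count
    data = struct.pack("<%di" % len(values), *values)        # the data itself
    return header + descriptor + data


def unpack_message(raw):
    (msg_id,) = struct.unpack_from("<I", raw, 0)
    type_code, n = struct.unpack_from("<II", raw, 4)
    values = list(struct.unpack_from("<%di" % n, raw, 12))
    return msg_id, type_code, values


raw = pack_message(100, TYPE_INT32, list(range(10)))
decoded = unpack_message(raw)
```

The receiver reads the type and count first and only then knows how to interpret the data that follows, which is why the data part carries them along.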
How ports are handled in MACH?
------------------------------
In GNU Mach, for a task to communicate with another task, it has to go through ports. More precisely, it needs to have send rights to the other task's port. So when a task is created in MACH, the kernel creates a port called the task port (similarly a thread port for threads), and the send rights for that port are placed in the task's task structure. Similarly, two other ports are created, namely the bootstrap port and the exception port.
The task in turn calls a routine, mach_task_self(), which provides the send rights to the task port.
Other basic requirements for a task, such as contacting the file system server (more about servers later), are taken care of by inheriting the environment ports, which are created during GNU Hurd initialization.
SNIP: Look at task.c and ipc_tt.c. Line 86 of ipc_tt.c deals with the above paragraph.
MiG:
----
MiG is the Mach 3.0 interface generator, as maintained by the GNU Hurd developers for the GNU project.
The interface generator produces stub code from interface definition (.defs) files. The stub code makes it easy to implement and use Mach interfaces as remote procedure calls (RPC).
Generally a .defs file is written which contains the functions to be implemented, and it is compiled with MiG. The output is two files, a server part and a client part. The server part contains the function prototypes. The function definitions are to be filled in by the developer to suit their requirements. The client part contains the routines to call the functions implemented by the server.
So the actual message passing part is implemented by MiG.
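A rough picture of what the two generated halves do, as a toy model in Python. Real MiG emits C from the .defs file; everything here (make_client_stub, Server, demux, data_add and the routine id 1) is hypothetical and only mimics the division of labour: the stub marshals the call into a message, the server side dispatches it to the function the developer filled in.

```python
# Toy model only: real MiG generates C. All names here are hypothetical.
def make_client_stub(routine_id, transport):
    """Mimics a generated client stub: marshal, send, unmarshal."""
    def stub(*args):
        reply = transport({"id": routine_id, "args": args})
        return reply["result"]
    return stub


class Server:
    """Mimics the generated server side: look up the routine and call it."""

    def __init__(self):
        self.routines = {}

    def register(self, routine_id, implementation):
        self.routines[routine_id] = implementation

    def demux(self, message):
        implementation = self.routines[message["id"]]
        return {"result": implementation(*message["args"])}


def data_add(a, b):
    # The part the developer writes by hand.
    return a + b


server = Server()
server.register(1, data_add)
add = make_client_stub(1, server.demux)   # the client just calls add(2, 3)
```

The caller of add() never builds a message itself, which is the convenience the stub code buys you.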
REFER TO data-server.c and data-client.c example.
The Requirement- Device Drivers:
--------------------------------
Shantanu Goel took the Linux (1.3.35) device drivers for block, SCSI, PCI and ISA devices and got them working with CMU Mach. The advantage is that no changes to the device driver sources were required. The wrapper took care of initialization, kernel memory allocation and I/O blocking. Currently GNU Mach has the 2.0 and 2.2 drivers of the Linux kernel.
Now the requirement is to port the Linux 2.6 kernel device drivers to GNU Mach. The emulation will produce a performance drop of a few microseconds.
The GNU Hurd:
-------------
Servers:
********
In the GNU Hurd you have servers, e.g. filesystem server, auth server, TCP/IP stack, block drivers... etc. The communication (IPC) interface is defined by the corresponding MiG .defs files. Each of these servers takes care of a specialized task, and together they implement a POSIX system. In future GNU Hurd will also support a distributed model called "collectives".
Translators:
************
Translators are hooks in the filesystem which in turn link to a task. E.g. you already have the ftpfs and httpfs file systems, where you can mount a remote FTP or HTTP server locally and run tar or grep on it.
$ settrans /root/.mbox /hurd/pop3fs --server=mail.gnu.org --user=....
This mounts a remote POP3 connection as a local mbox file.
Once a translator is set (say, ftpfs on /tmp/ftp), you can use the normal file system commands like cat, ls, grep etc. on /tmp/ftp and it will behave like a local file system.
Similarly, translators could be written for gzip, http ... the list ends with your creativity. A whole list of libraries is available for this (e.g. libdiskfs, libnetfs ..)
Two types of translators:

*Active translators - these are lost on reboot, and showtrans does not work on them.
*Passive translators.
CASE STUDY:
~~~~~~~~~~~
The famous /dev/null in Linux is implemented as a translator in HURD. If you run ps you can see /hurd/null running as a task.
This is hooked to /dev/null as a translator.
If you have understood all this, then you should be wondering how a task (which acts as a translator) is able to understand cat, grep, ls ... because I did too, and AB explained it beautifully, which takes us to the next topic.
POSIX implementation on HURD:
-----------------------------
Welcome to the real world!!! or rather, how deep is the rabbit hole??..:)
Before we get into this.. there is a function that needs to be discussed: dirlookup(). This function returns send rights (read the important concepts in Mach above) to a particular task (similar to a DNS server, which returns an IP address for a host name).
The libc calls like open, read, write .. in turn run a dirlookup() to get the rights from the required server. E.g. if there is a regular file /tmp/foo, an open on it would request rights from the filesystem server, but if /tmp/foo is a translator linked with foobar, then the rights would be returned for foobar.
Now both the filesystem server and foobar will have to implement the same set of functions, eg. io_read, io_stat etc.
How is this done, you might ask.
Remember the .defs file?? (read MiG above) It is the developer who fills in the function definitions (go back and read MiG once more if in doubt). So in foobar, for io_read, I can do a socket connect, or anything I like.
So when you run cat, you call open, which does a dirlookup and returns the send right (either to a translator task or to a server), and then open in turn calls io_open, io_read.. on that server.
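The whole lookup-then-call flow fits in a few lines of toy Python. This is a sketch under assumptions and not Hurd code: Namespace, FileServer, FoobarTranslator and the cat helper are hypothetical, and plain method calls stand in for the RPCs. What it does show is that the caller stays the same whether the path is a regular file or a translator, because both implement io_read.

```python
# Toy model only: plain method calls stand in for RPCs; names are hypothetical.
class FileServer:
    """Stands in for the ordinary filesystem server."""

    def __init__(self, files):
        self.files = files

    def io_read(self, path):
        return self.files[path]


class FoobarTranslator:
    """Stands in for a translator; io_read can do anything it likes."""

    def io_read(self, path):
        return b"generated by foobar for " + path.encode()


class Namespace:
    def __init__(self, default_server):
        self.default_server = default_server
        self.translators = {}

    def settrans(self, path, server):
        self.translators[path] = server

    def dirlookup(self, path):
        # Return whoever answers for this path: a translator if one is
        # set on it, otherwise the filesystem server.
        return self.translators.get(path, self.default_server)


def cat(namespace, path):
    server = namespace.dirlookup(path)   # get the "send right"
    return server.io_read(path)          # same call either way


ns = Namespace(FileServer({"/tmp/foo": b"plain file"}))
before = cat(ns, "/tmp/foo")
ns.settrans("/tmp/foo", FoobarTranslator())
after = cat(ns, "/tmp/foo")
```

Before the translator is set, cat gets the file contents from the filesystem server; afterwards the very same call is answered by foobar.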
and we already know how to write a translator!!! :).. even though you missed the meet. LUCK HUH!! :)
Cool tool librpci:
------------------
I don't really remember much about this (my brain was supercharged by now, 4.5 hrs of HURD :)), but this tool will allow you to do a whole lot of things like stealing the task port and recreating errors. E.g. if a server seems buggy to you, you could take a dump, then run librpci with this dump; librpci will simulate the previous situation with the help of the dump file. This should recreate the errors, making debugging easier.
Closing notes:
--------------
If you have installed GNU/Hurd, sshd will not run because /dev/urandom is not yet implemented in HURD. So you could write a binary which returns something random (or the same thing every time :) ) and create a symlink to it as /dev/urandom. You will then have sshd running fine.
Alright, so you didn't attend the meet, or you attended it and slept throughout, but at least you have read this document and come this far. That gives more meaning to my effort in typing this document. Thank you.
Mohammed Riyaz P. (HAPPY HACKING :) (quoting RMS))
==============================================================================
Here is some more info I wrote. Usually you take care of this after you finish the installation and log in for the first time.
SWAP and CDROM
---------------
Also don't forget to add SWAP entries in your /etc/fstab after installation completes.
You need to create devices before you use them. If you have a swap partition, say hd0s2 (hda2), and a cdrom as hd2 (hdc), then
# cd /dev
# ./MAKEDEV hd0s2
# ./MAKEDEV hd2
and add these two lines to /etc/fstab
/dev/hd0s2 none swap sw 0 0
/dev/hd2 /cdrom iso9660 ro 0 0
# swapon -a
APT
----
If your network card works, add this to your /etc/apt/sources.list
deb http://ftp.gnuab.org/debian unreleased main
deb http://ftp.debian.org/debian unstable main
Or, if you have set up APT to use the CDROM instead, use this:
deb file:/cdrom/debian unstable main contrib local non-US/main non-US/contrib
NETWORK
--------
Set up networking like this: (Choose IP addresses appropriately)
# settrans -fgap /servers/socket/2 /hurd/pfinet -i eth0 \
    -a 192.168.1.54 -g 192.168.1.1 -m 255.255.255.0
# echo "nameserver 202.54.15.1" >> /etc/resolv.conf
# ping www.gnu.org
SSH
----
The ssh installation will fail because of the missing /dev/urandom. Before you proceed with the ssh installation, temporarily create a symbolic link to some binary file in place of urandom. This is just a dirty hack..
# ln -s /bin/bash /dev/urandom
CONSOLE
--------
I am comfortable with "screen" package, except I re-map C-a prefix with
Happy Hacking,