
Making an HTML parser for a website


David.

Hey guys,

Let's say I'm trying to parse data from a website that has no JSON API or RSS feed, for instance this forum. If I were to make an app for this website, would it be wise to parse the raw HTML, or is there a better way?

Thanks!


It will depend on what you can get from the site.

Does the site have a print version or a basic/mobile template you can request? If you can get one of those, it will send more of the data you need and less of the added sh*t.

Also try asking the folk who run it to set up an RSS feed. I'd imagine they would if it is of benefit to them.


If you want to parse an HTML document and read the information you need, there are a few decent HTML parsers.

For C#, I have used the Html Agility Pack.

Using regex is not a good way to extract information from an HTML document, because the HTML may be poorly written.

You can't write a complete HTML parser yourself unless you have solid programming skills.
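To make the regex point concrete, here is a minimal sketch in Python using the standard-library `html.parser` module (a stand-in chosen for a self-contained example; the post itself mentions Html Agility Pack for C#). The sample markup is made up and deliberately sloppy: an unclosed tag, an uppercase tag with an unquoted attribute, and single-quoted attributes. The parser normalizes all of these, while a naive regex only catches the one link written the way it expects.

```python
from html.parser import HTMLParser
import re


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, however the markup is written."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names and strips quotes,
        # so <A HREF=x>, <a href='x'> and <a href="x"> all arrive the same way.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical, deliberately sloppy markup (unclosed <p>, mixed quoting).
LINK_SAMPLE = """
<p>First post
<A HREF=/topic/1>one</A>
<a class='x' href='/topic/2'>two</a>
<a href="/topic/3">three
"""


def links_with_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def links_with_naive_regex(html):
    # Only matches a lowercase href with a double-quoted value.
    return re.findall(r'href="(.*?)"', html)


if __name__ == "__main__":
    print(links_with_parser(LINK_SAMPLE))       # ['/topic/1', '/topic/2', '/topic/3']
    print(links_with_naive_regex(LINK_SAMPLE))  # ['/topic/3']
```

The same idea applies to whichever parsing library you use in your own language.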


What language are you using? The best ones I know of are:

(Python) Beautiful Soup

(PHP) PHP DOM

(JavaScript) jQuery

If you clarify what language you're after, I'd be happy to provide a few more hints :-)

Well, for GTAForums.com there is no mobile version, and I don't think we'll have an RSS feed set up anytime soon.

@fastman92

Since I'm coding in Java, I've found that TagSoup and HtmlCleaner are pretty good.

I guess what I'm asking is: is this really a good idea? Just parsing the raw HTML documents to make a mobile app?

If the server doesn't provide any data except the pages themselves, then you have to parse the HTML pages to get the information.

You can extract it by parsing the HTML with a library and then selecting elements by ID, class, name, tag, or some other criterion.
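A minimal sketch of selecting elements by class, again in Python with the standard-library `html.parser` so it runs without extra packages. The class name `post-body` and the sample markup are hypothetical, not taken from any real forum template. Matching on `id`, `name`, or the tag itself works the same way: check a different attribute (or `tag`) in `handle_starttag`.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element whose class list contains target_class.

    Assumes tags inside a matched element are properly closed.
    """

    # Void elements never get an end tag, so they must not affect the depth count.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text fragments of the element being read
        self.results = []   # one whitespace-normalized string per match

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html, target_class):
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup; "post-body" is an illustrative class name.
POST_SAMPLE = (
    '<div class="post"><div class="post-body">Hello <b>GTAF</b>!</div></div>'
    '<div class="post-body">Second post</div>'
)

if __name__ == "__main__":
    print(extract_by_class(POST_SAMPLE, "post-body"))  # ['Hello GTAF!', 'Second post']
```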


Purely from a support point of view, I'd try to find another option. Generally speaking, if the site changes its layout in any way, your app will break.

 

What site is it you're looking to app-ify? If it is indeed GTAF, your best bet would be the printable version of each page.

For example, this topic is

http://www.gtaforums.com/index.php?showtopic=558931

You'd take the showtopic value and use it as the t value here:

http://www.gtaforums.com/index.php?act=Pri...rinter&t=558931

The printable page is more likely to keep the same layout for a long time.
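The showtopic-to-t step can be sketched in Python with `urllib.parse`. The printable URL quoted above is truncated, so the template used here (`https://example.invalid/print?t={t}`) is a hypothetical placeholder you would swap for the real printable URL; only the idea of reusing the `showtopic` value as `t` comes from the post.

```python
from urllib.parse import urlparse, parse_qs, quote


def topic_id_from_url(url):
    """Return the showtopic value from a topic URL, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("showtopic")
    return values[0] if values else None


def printable_url(topic_url, template):
    """Fill the topic id into a printable-page URL template containing '{t}'.

    The template is supplied by the caller; the site's real printable URL
    is not spelled out here.
    """
    topic_id = topic_id_from_url(topic_url)
    if topic_id is None:
        raise ValueError("no showtopic parameter in " + topic_url)
    return template.format(t=quote(topic_id, safe=""))


# Hypothetical placeholder template, NOT the site's real printable URL.
PRINT_TEMPLATE = "https://example.invalid/print?t={t}"

if __name__ == "__main__":
    url = "http://www.gtaforums.com/index.php?showtopic=558931"
    print(topic_id_from_url(url))              # 558931
    print(printable_url(url, PRINT_TEMPLATE))  # https://example.invalid/print?t=558931
```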

 

 

Something else to watch out for, however, is that in this case the printable page shows the entire topic. Your app may not last long if it becomes popular, because of the load it could put on the site and, depending on the site's hosting plan, how much bandwidth it chomps through.
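One common way to limit that load is to cache pages on the client so the same topic isn't re-downloaded on every view. This is a swapped-in mitigation rather than something the post prescribes; the Python sketch below wraps any fetch function with a simple time-to-live cache. The `fetch` callable and the 300-second default TTL are assumptions for illustration.

```python
import time


class CachedFetcher:
    """Wraps a fetch function so each URL is fetched at most once per TTL window.

    `fetch` is any callable taking a URL and returning the page body (a
    hypothetical hook; in a real app it would be your HTTP call). The clock
    is injectable so the caching logic can be exercised without waiting.
    """

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self.fetch = fetch
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache = {}  # url -> (fetched_at, body)

    def get(self, url):
        now = self.clock()
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: no request goes to the site
        body = self.fetch(url)
        self._cache[url] = (now, body)
        return body


if __name__ == "__main__":
    calls = []

    def fake_fetch(url):
        calls.append(url)
        return "<html>page for " + url + "</html>"

    fake_now = [0.0]
    fetcher = CachedFetcher(fake_fetch, ttl_seconds=60, clock=lambda: fake_now[0])
    fetcher.get("topic-1")
    fetcher.get("topic-1")  # served from the cache
    fake_now[0] = 61.0
    fetcher.get("topic-1")  # expired, fetched again
    print(len(calls))       # 2
```

Injecting the clock and the fetch function keeps the caching logic testable without touching the network.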

 

I'd definitely recommend seeking permission from the site first; they may even have a better option for you that you don't know about.


Thanks for all the help, guys!

I guess I'll try asking Mr. Tank by mail whether there are any additional options. Otherwise, I'll just go ahead with the printable version like eggburt suggested.

I'm rather new to Android development, and I thought making an app for GTAF would be a fun project to try out.
